Performance Improvement by Overlapping Computation and Communication on SMP Clusters
Abstract
Clusters of SMPs (Symmetric Multiprocessor Systems) have emerged as important platforms for high performance computing. As a programming scheme for SMP clusters, we proposed hybrid shared memory/distributed memory programming. To tolerate inter-node communication, we overlapped inter-node communication with computation using NICAM, a set of remote-memory-based user-level communication primitives that we designed for our SMP cluster COMPaS. In this paper, we report on this programming approach and its performance on COMPaS when communication and computation are overlapped. Our experimental results show that communication is almost entirely hidden and that execution time is reduced by 25% in the best case.
Similar resources
Non-Uniform Partitioning of Finite Difference Methods Running on SMP Clusters
A multicomputer or workstation cluster with multiprocessor nodes introduces significant need and opportunity for overlapping communication with computation. We evaluate partitioning strategies for an important application class, finite difference methods, running on clusters of symmetric multiprocessors. Our results show that even for a regular, uniform finite difference method, a non-uniform partitio...
Asynchronous Parallel Programming Model for SMP Clusters
Our study proposes a novel MPI-only parallel programming model with improved performance for SMP clusters. By rescheduling tasks in a typical flat MPI solution, our model forces processors of an SMP node to work in different phases, thereby avoiding unnecessary communication and computation bottlenecks. This study achieves a significant performance improvement with a minimal programming effort...
Improving Linpack Performance on SMP Clusters with Asynchronous MPI Programming
This study proposes asynchronous MPI, a simple and effective parallel programming model for SMP clusters, to reimplement the High Performance Linpack benchmark. The proposed model forces processors of an SMP node to work in different phases, thereby avoiding unnecessary communication and computation bottlenecks. As a result, we can achieve significant improvements in performance with a minimal ...
Large Scientific Calculations on Dedicated Clusters of Workstations
The availability of high-capacity, low-latency interconnects enables clusters of workstations to form powerful multicomputers with good computing and memory capabilities. To achieve high performance with modern cache hierarchies, software must be written with the architecture in mind; for example, reducing data movement in data-intensive applications increases performance. Data movement becomes even ...
Publication date: 1998